Beta-Binomial (betabinom) Distribution#
The beta-binomial distribution models a count of successes in a fixed number of trials when the success probability is unknown and varies across repeated experiments.
A clean way to say it:
Draw a random probability \(p \sim \mathrm{Beta}(\alpha, \beta)\), then draw successes \(X\mid p \sim \mathrm{Binomial}(n, p)\).
The marginal distribution of \(X\) is beta-binomial.
It is a standard choice when binomial data are overdispersed (more variable than a binomial model allows).
Learning goals#
Understand the hierarchical story behind betabinom and when to use it
Write the PMF/CDF, compute moments, and interpret parameters \((n,\alpha,\beta)\)
Derive mean/variance from first principles
Sample and visualize the distribution (NumPy-only sampler)
Use scipy.stats.betabinom for PMF/CDF/RVS and implement a practical MLE “fit” routine
import math
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from scipy import optimize, special, stats
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)
import plotly
import scipy
print("numpy :", np.__version__)
print("pandas:", pd.__version__)
print("scipy :", scipy.__version__)
print("plotly:", plotly.__version__)
numpy : 1.26.2
pandas: 2.1.3
scipy : 1.15.0
plotly: 6.5.2
Prerequisites#
Binomial distribution basics: counts of successes in \(n\) trials
Beta distribution basics: a distribution on probabilities \(p \in (0,1)\)
Law of total expectation/variance
Comfort reading math in LaTeX
1) Title & classification#
Name: betabinom (beta-binomial)
Type: Discrete distribution
Support: \( \mathcal{X} = \{0, 1, 2, \dots, n\} \)
Parameter space: \( n \in \{0,1,2,\dots\},\qquad \alpha > 0,\qquad \beta > 0 \)
SciPy uses the names (n, a, b) for \((n, \alpha, \beta)\).
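For example, the frozen-distribution interface accepts these names as keywords (illustrative parameter values):

```python
from scipy import stats

# Frozen beta-binomial using SciPy's (n, a, b) naming for (n, alpha, beta)
dist = stats.betabinom(n=10, a=2.0, b=3.0)
print(dist.mean())  # n * a / (a + b) = 10 * 2 / 5 = 4.0
print(dist.pmf(0))  # P(X = 0)
```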
2) Intuition & motivation#
What it models#
A binomial model assumes a single fixed success probability \(p\).
In many real settings, \(p\) is not constant:
users vary in propensity to click
batches vary in defect rate
clinics vary in response rate
A simple way to model this heterogeneity is to treat \(p\) as random:
The marginal distribution of \(X\) is beta-binomial.
Real-world use cases#
Overdispersed binomial counts: when sample variance exceeds \(n\hat p(1-\hat p)\)
Empirical Bayes for rates: estimate a prior over \(p\) across many groups
Posterior predictive for binomial + beta prior (conjugate Bayesian modeling)
Relations to other distributions#
Mixture: beta-binomial is a Beta–Binomial compound distribution (a mixture of binomials)
Conjugacy: Beta is conjugate prior for Binomial; beta-binomial is the predictive distribution
Pólya’s urn: beta-binomial counts appear as draws in an urn scheme with reinforcement
Dirichlet-multinomial: multivariate generalization (counts across multiple categories)
3) Formal definition#
Hierarchical definition#
\[
p \sim \mathrm{Beta}(\alpha, \beta), \qquad X \mid p \sim \mathrm{Binomial}(n, p).
\]
PMF#
For \(k \in \{0,1,\dots,n\}\):
\[
P(X = k) = \binom{n}{k}\,\frac{B(k+\alpha,\; n-k+\beta)}{B(\alpha, \beta)},
\]
where \(B(\cdot,\cdot)\) is the beta function:
\[
B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x+y)}.
\]
A numerically stable computation uses the log form:
\[
\log P(X = k) = \log\binom{n}{k} + \log B(k+\alpha,\; n-k+\beta) - \log B(\alpha,\beta).
\]
CDF#
The CDF is the cumulative sum of the PMF:
\[
F(k) = P(X \le k) = \sum_{j=0}^{\lfloor k \rfloor} P(X = j).
\]
There is no simple elementary closed form; libraries compute it efficiently and stably.
def betabinom_logpmf(k: np.ndarray, n: int, a: float, b: float) -> np.ndarray:
"""Log PMF using stable special functions (vectorized).
Parameters
----------
k : array-like
Counts in {0, ..., n}.
n : int
Number of trials.
a, b : float
Beta shape parameters (alpha, beta) > 0.
"""
k = np.asarray(k)
log_choose = (
special.gammaln(n + 1)
- special.gammaln(k + 1)
- special.gammaln(n - k + 1)
)
return log_choose + special.betaln(k + a, n - k + b) - special.betaln(a, b)
def betabinom_pmf(k: np.ndarray, n: int, a: float, b: float) -> np.ndarray:
return np.exp(betabinom_logpmf(k, n=n, a=a, b=b))
def betabinom_cdf(k: np.ndarray, n: int, a: float, b: float) -> np.ndarray:
"""CDF computed by cumulative sum of PMF on support."""
ks = np.arange(0, n + 1)
pmf = betabinom_pmf(ks, n=n, a=a, b=b)
cdf_full = np.cumsum(pmf)
k = np.asarray(k)
k_clip = np.clip(np.floor(k).astype(int), 0, n)
return cdf_full[k_clip]
def betabinom_entropy(n: int, a: float, b: float) -> float:
"""Shannon entropy in nats: H = -sum p log p."""
ks = np.arange(0, n + 1)
logp = betabinom_logpmf(ks, n=n, a=a, b=b)
p = np.exp(logp)
return float(-np.sum(p * logp))
def betabinom_stats_closed_form(n: int, a: float, b: float):
"""Mean, variance, skewness, excess kurtosis (Fisher)."""
s = a + b
p = a / s
q = b / s
mean = n * p
var = n * p * q * (n + s) / (s + 1)
# For n=0 the distribution is degenerate at 0; higher standardized moments are undefined.
if n == 0 or var == 0:
return float(mean), float(var), np.nan, np.nan
skew = (s + 2 * n) * (b - a) * np.sqrt(s + 1) / ((s + 2) * np.sqrt(n * a * b * (s + n)))
t = a * b
kurt_excess = (
(s + 1)
* (
s**4
+ (6 * n - 1) * s**3
+ (6 * n**2 + 3 * t * (n - 2)) * s**2
- 3 * t * n * (6 - n) * s
- 18 * t * n**2
)
/ (n * t * (s + n) * (s + 2) * (s + 3))
- 3
)
return float(mean), float(var), float(skew), float(kurt_excess)
def rvs_betabinom_numpy(n: int, a: float, b: float, size: int, rng: np.random.Generator) -> np.ndarray:
"""NumPy-only sampler via Gamma->Beta + Binomial.
- If G1 ~ Gamma(a, 1) and G2 ~ Gamma(b, 1), then P = G1 / (G1 + G2) ~ Beta(a, b).
- Then sample X ~ Binomial(n, P).
"""
g1 = rng.gamma(shape=a, scale=1.0, size=size)
g2 = rng.gamma(shape=b, scale=1.0, size=size)
p = g1 / (g1 + g2)
return rng.binomial(n, p)
4) Moments & properties#
Let \(X \sim \mathrm{BetaBinomial}(n,\alpha,\beta)\) and define \(s = \alpha+\beta\).
Mean#
\[
\mathbb{E}[X] = n\,\frac{\alpha}{\alpha+\beta} = n\,\frac{\alpha}{s}.
\]
Variance#
\[
\mathrm{Var}(X) = \frac{n\,\alpha\beta\,(n+s)}{s^{2}(s+1)}.
\]
This is often written as \( \mathrm{Var}(X) = n\,p(1-p)\,\frac{n+s}{s+1} \) with \( p=\frac{\alpha}{s} \).
Compared to the binomial variance \(n p(1-p)\), the factor \((n+s)/(s+1) \ge 1\) produces overdispersion.
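A quick numerical sanity check of this factor, using illustrative values \(n=10,\ \alpha=2,\ \beta=3\):

```python
from scipy import stats

# Overdispersion factor: Var(beta-binomial) / Var(binomial at the same mean)
n, a, b = 10, 2.0, 3.0
s = a + b
p = a / s
var_binom = n * p * (1 - p)            # binomial variance at mean p
var_bb = stats.betabinom.var(n, a, b)  # beta-binomial variance
print(var_bb / var_binom)              # equals (n + s) / (s + 1) = 15 / 6 = 2.5
```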
Skewness#
\[
\gamma_1 = \frac{(s + 2n)(\beta - \alpha)}{s + 2}\,\sqrt{\frac{s+1}{n\,\alpha\beta\,(s+n)}}.
\]
Excess kurtosis (Fisher)#
A symmetric closed form is (let \(t=\alpha\beta\)):
\[
\gamma_2 = \frac{(s+1)\bigl[s^4 + (6n-1)s^3 + \bigl(6n^2 + 3t(n-2)\bigr)s^2 - 3tn(6-n)s - 18tn^2\bigr]}{n\,t\,(s+n)(s+2)(s+3)} - 3.
\]
PGF / MGF / characteristic function#
The probability generating function (PGF) can be written using the Gauss hypergeometric function:
\[
G(z) = \mathbb{E}\bigl[z^X\bigr] = {}_2F_1\!\bigl(-n,\ \alpha;\ \alpha+\beta;\ 1-z\bigr).
\]
Then:
MGF: \(M(t)=\mathbb{E}[e^{tX}] = G(e^t)\)
CF: \(\varphi(t)=\mathbb{E}[e^{itX}] = G(e^{it})\)
Because \(X \in \{0,\dots,n\}\) is bounded, all moments exist.
Entropy#
There is no simple closed form in elementary functions; it is typically computed numerically:
\[
H = -\sum_{k=0}^{n} P(X=k)\,\log P(X=k).
\]
Intraclass correlation (trial-to-trial dependence)#
In the hierarchical view, the Bernoulli trials are conditionally independent given \(p\), but marginally correlated.
The correlation between two distinct trials in the same group is:
\[
\rho = \frac{1}{\alpha+\beta+1} = \frac{1}{s+1}.
\]
Binomial data correspond to the limiting case \(\rho \to 0\) (equivalently \(s \to \infty\)).
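This identity is easy to verify by simulation; a minimal sketch with illustrative \(\alpha=2,\ \beta=3\):

```python
import numpy as np

# Marginal correlation between two Bernoulli trials that share the same
# latent p, compared with the closed form rho = 1/(alpha + beta + 1)
rng_icc = np.random.default_rng(0)
a, b = 2.0, 3.0
p = rng_icc.beta(a, b, size=500_000)  # one latent p per group
t1 = rng_icc.binomial(1, p)           # first trial in each group
t2 = rng_icc.binomial(1, p)           # second trial, independent given p
rho_mc = float(np.corrcoef(t1, t2)[0, 1])
rho_cf = 1.0 / (a + b + 1.0)
print(rho_mc, rho_cf)                 # both close to 1/6
```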
# Quick numerical check: closed-form moments vs SciPy
n, a, b = 10, 2.0, 3.0
mean_cf, var_cf, skew_cf, kurt_cf = betabinom_stats_closed_form(n, a, b)
mean_sp, var_sp, skew_sp, kurt_sp = stats.betabinom.stats(n, a, b, moments="mvsk")
pd.DataFrame(
{
"closed_form": [mean_cf, var_cf, skew_cf, kurt_cf],
"scipy": [float(mean_sp), float(var_sp), float(skew_sp), float(kurt_sp)],
},
index=["mean", "variance", "skewness", "excess_kurtosis"],
)
| | closed_form | scipy |
|---|---|---|
| mean | 4.000000 | 4.000000 |
| variance | 6.000000 | 6.000000 |
| skewness | 0.291606 | 0.291606 |
| excess_kurtosis | -0.690476 | -0.690476 |
# Entropy (numerical) + PGF/MGF evaluation via hypergeometric function
H = betabinom_entropy(n=10, a=2.0, b=3.0)
def betabinom_pgf(s: complex, n: int, a: float, b: float) -> complex:
return complex(special.hyp2f1(-n, a, a + b, 1 - s))
M0 = betabinom_pgf(1.0, n=10, a=2.0, b=3.0) # should be 1
M_small = betabinom_pgf(np.exp(0.2), n=10, a=2.0, b=3.0)
H, M0, M_small
(2.2577232124962414, (1+0j), (2.518957094288793+0j))
5) Parameter interpretation#
\(n\) (trials)#
Controls the maximum number of successes.
Scaling \(n\) changes the range and (often) the granularity of the PMF.
\((\alpha,\beta)\) (Beta prior over \(p\))#
Interpretations:
Prior mean of the success probability: \( \mathbb{E}[p] = \frac{\alpha}{\alpha+\beta} \)
Prior “concentration” (how strongly \(p\) is concentrated around its mean): \( s = \alpha+\beta \). Larger \(s\) means **less heterogeneity** in \(p\) and a distribution closer to \(\mathrm{Binomial}(n,\mathbb{E}[p])\).
Heuristic pseudo-count view: \(\alpha-1\) prior successes and \(\beta-1\) prior failures.
Shape changes#
Fix \(p=\alpha/(\alpha+\beta)\) and increase \(s\): PMF tightens around \(np\) (approaches binomial).
Fix \(s\) and vary \(p\): the mass shifts left/right.
If \(\alpha,\beta<1\) the Beta prior is U-shaped, so \(p\) often lands near 0 or 1 → PMF puts more mass near 0 and \(n\).
# Effect of concentration s = a+b at fixed mean p
n = 20
p = 0.30
concentrations = [2, 5, 20, 200]
ks = np.arange(n + 1)
frames = []
for s in concentrations:
a = p * s
b = (1 - p) * s
pmf = stats.betabinom.pmf(ks, n, a, b)
frames.append(
pd.DataFrame(
{
"k": ks,
"pmf": pmf,
"concentration (a+b)": f"{s:g}",
}
)
)
df = pd.concat(frames, ignore_index=True)
fig = px.line(
df,
x="k",
y="pmf",
color="concentration (a+b)",
markers=True,
title="Beta-binomial PMF: increasing concentration approaches a Binomial",
labels={"k": "successes k", "pmf": "P(X=k)"},
)
fig.show()
# Effect of changing mean p at fixed concentration s
n = 20
s = 10
ps = [0.1, 0.3, 0.5, 0.7]
ks = np.arange(n + 1)
frames = []
for p in ps:
a = p * s
b = (1 - p) * s
pmf = stats.betabinom.pmf(ks, n, a, b)
frames.append(pd.DataFrame({"k": ks, "pmf": pmf, "p": f"{p:.1f}"}))
df = pd.concat(frames, ignore_index=True)
fig = px.line(
df,
x="k",
y="pmf",
color="p",
markers=True,
title="Beta-binomial PMF: varying mean p shifts the mass",
labels={"k": "successes k", "pmf": "P(X=k)", "p": "mean p"},
)
fig.show()
6) Derivations#
Expectation#
Use the law of total expectation:
\[
\mathbb{E}[X] = \mathbb{E}\bigl[\mathbb{E}[X \mid p]\bigr] = \mathbb{E}[np] = n\,\mathbb{E}[p] = n\,\frac{\alpha}{\alpha+\beta}.
\]
Variance#
Use the law of total variance:
\[
\mathrm{Var}(X) = \mathbb{E}\bigl[\mathrm{Var}(X \mid p)\bigr] + \mathrm{Var}\bigl(\mathbb{E}[X \mid p]\bigr).
\]
For a binomial conditional on \(p\):
\(\mathbb{E}[X\mid p]=np\)
\(\mathrm{Var}(X\mid p)=np(1-p)\)
So:
\[
\mathrm{Var}(X) = n\,\mathbb{E}[p(1-p)] + n^{2}\,\mathrm{Var}(p).
\]
Using Beta moments (with \(s=\alpha+\beta\)):
\[
\mathbb{E}[p] = \frac{\alpha}{s}, \qquad \mathrm{Var}(p) = \frac{\alpha\beta}{s^{2}(s+1)}, \qquad \mathbb{E}[p(1-p)] = \frac{\alpha\beta}{s(s+1)}.
\]
After algebra, this yields:
\[
\mathrm{Var}(X) = \frac{n\,\alpha\beta\,(n+s)}{s^{2}(s+1)} = n\,p(1-p)\,\frac{n+s}{s+1}, \qquad p = \frac{\alpha}{s}.
\]
Likelihood#
For a single observation \(k\) (with fixed \(n\)), the likelihood is just the PMF viewed as a function of \((\alpha,\beta)\):
\[
L(\alpha,\beta \mid k) = \binom{n}{k}\,\frac{B(k+\alpha,\; n-k+\beta)}{B(\alpha,\beta)}.
\]
For i.i.d. observations \(k_1,\dots,k_m\) (same \(n\)), the log-likelihood is:
\[
\ell(\alpha,\beta) = \sum_{i=1}^{m}\Bigl[\log\binom{n}{k_i} + \log B(k_i+\alpha,\; n-k_i+\beta) - \log B(\alpha,\beta)\Bigr].
\]
In practice we compute this in log-space (e.g., using scipy.special.betaln).
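For instance, the total log-likelihood of a small made-up sample can be evaluated directly in log-space:

```python
import numpy as np
from scipy import stats

# ell(alpha, beta) = sum_i log P(X = k_i), evaluated with logpmf
data_ll = np.array([2, 4, 4, 6, 9])  # hypothetical observations with n = 10
ell = float(stats.betabinom.logpmf(data_ll, 10, 2.0, 3.0).sum())
print(ell)                           # a finite, negative log-likelihood
```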
# Monte Carlo check of mean/variance
n, a, b = 30, 2.0, 5.0
x = rvs_betabinom_numpy(n=n, a=a, b=b, size=200_000, rng=rng)
mean_mc = float(x.mean())
var_mc = float(x.var(ddof=0))
mean_cf, var_cf, *_ = betabinom_stats_closed_form(n, a, b)
pd.DataFrame(
{
"monte_carlo": [mean_mc, var_mc],
"closed_form": [mean_cf, var_cf],
},
index=["mean", "variance"],
)
| | monte_carlo | closed_form |
|---|---|---|
| mean | 8.572700 | 8.571429 |
| variance | 28.446845 | 28.316327 |
7) Sampling & simulation#
The beta-binomial sampler is a direct implementation of the hierarchical model:
Sample a probability \(p \sim \mathrm{Beta}(\alpha,\beta)\)
Sample a count \(X \sim \mathrm{Binomial}(n,p)\)
To keep this NumPy-only, we sample \(p\) via the Gamma ratio construction:
\[
G_1 \sim \mathrm{Gamma}(\alpha, 1), \quad G_2 \sim \mathrm{Gamma}(\beta, 1) \;\Longrightarrow\; p = \frac{G_1}{G_1 + G_2} \sim \mathrm{Beta}(\alpha, \beta).
\]
Then we sample \(X\sim\mathrm{Binomial}(n,p)\).
n, a, b = 20, 1.5, 3.0
samples = rvs_betabinom_numpy(n=n, a=a, b=b, size=20_000, rng=rng)
samples[:10], samples.mean(), samples.var(ddof=0)
(array([ 2, 11, 11, 7, 0, 4, 5, 5, 7, 12]), 6.65555, 19.7927041975)
8) Visualization#
We’ll plot:
the PMF on \(\{0,\dots,n\}\)
the CDF as a cumulative sum
a Monte Carlo approximation (histogram of samples)
n, a, b = 20, 1.5, 3.0
ks = np.arange(n + 1)
pmf = stats.betabinom.pmf(ks, n, a, b)
cdf = stats.betabinom.cdf(ks, n, a, b)
fig = make_subplots(rows=1, cols=2, subplot_titles=["PMF", "CDF"])
fig.add_trace(go.Bar(x=ks, y=pmf, name="PMF"), row=1, col=1)
fig.add_trace(go.Scatter(x=ks, y=cdf, mode="lines+markers", name="CDF"), row=1, col=2)
fig.update_xaxes(title_text="k (successes)", row=1, col=1)
fig.update_xaxes(title_text="k (successes)", row=1, col=2)
fig.update_yaxes(title_text="P(X=k)", row=1, col=1)
fig.update_yaxes(title_text="P(X\u2264k)", row=1, col=2)
fig.update_layout(title=f"Beta-binomial(n={n}, a={a}, b={b})", showlegend=False)
fig.show()
# Monte Carlo histogram vs PMF
samples = rvs_betabinom_numpy(n=n, a=a, b=b, size=50_000, rng=rng)
counts = np.bincount(samples, minlength=n + 1)
pmf_mc = counts / counts.sum()
fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pmf_mc, name="Monte Carlo", opacity=0.6))
fig.add_trace(go.Scatter(x=ks, y=pmf, mode="lines+markers", name="True PMF"))
fig.update_layout(
title="Beta-binomial: empirical frequency vs true PMF",
xaxis_title="k (successes)",
yaxis_title="probability",
)
fig.show()
9) SciPy integration (scipy.stats.betabinom)#
SciPy provides:
pmf / logpmf
cdf
rvs
stats (mean/var/skew/kurt)
About .fit#
As of SciPy 1.15, scipy.stats.betabinom (a discrete distribution) does not expose a built-in .fit method.
In practice you can fit \((\alpha,\beta)\) by maximizing the log-likelihood with scipy.optimize.
Below is a minimal but usable MLE routine that assumes known \(n\).
n, a, b = 20, 1.5, 3.0
ks = np.arange(n + 1)
pmf_sp = stats.betabinom.pmf(ks, n, a, b)
cdf_sp = stats.betabinom.cdf(ks, n, a, b)
x_sp = stats.betabinom.rvs(n, a, b, size=10, random_state=rng)
pmf_sp[:5], cdf_sp[:5], x_sp
(array([0.051885, 0.070752, 0.080018, 0.084018, 0.084571]),
array([0.051885, 0.122637, 0.202655, 0.286674, 0.371245]),
array([1, 0, 1, 4, 2, 8, 6, 6, 8, 6]))
def betabinom_moments_init(data: np.ndarray, n: int):
"""Method-of-moments style initializer for (a, b).
If empirical variance <= binomial variance, we return a large concentration
(close to a binomial model).
"""
data = np.asarray(data)
mu = float(data.mean())
v = float(data.var(ddof=0))
p = np.clip(mu / n, 1e-6, 1 - 1e-6)
v_bin = n * p * (1 - p)
if v <= v_bin * (1 + 1e-6):
s = 1e6
else:
r = v / v_bin
s = (n - r) / (r - 1)
s = max(s, 1e-3)
a0 = p * s
b0 = (1 - p) * s
return float(a0), float(b0)
def fit_betabinom_mle(data: np.ndarray, n: int):
"""Fit (a,b) by MLE with fixed n.
Uses an unconstrained parameterization a=exp(theta0), b=exp(theta1).
"""
data = np.asarray(data)
if data.ndim != 1:
raise ValueError("data must be 1D")
if np.any((data < 0) | (data > n)):
raise ValueError("data must be in {0,...,n}")
if not np.issubdtype(data.dtype, np.integer):
# allow float arrays with integer values
if not np.all(np.isclose(data, np.round(data))):
raise ValueError("data must be integer-valued")
data = np.round(data).astype(int)
a0, b0 = betabinom_moments_init(data, n=n)
x0 = np.log([a0, b0])
def nll(theta):
a = float(np.exp(theta[0]))
b = float(np.exp(theta[1]))
return -float(np.sum(stats.betabinom.logpmf(data, n, a, b)))
res = optimize.minimize(nll, x0=x0, method="L-BFGS-B")
a_hat, b_hat = np.exp(res.x)
return {
"a_hat": float(a_hat),
"b_hat": float(b_hat),
"success": bool(res.success),
"nll": float(res.fun),
"message": res.message,
}
# Demo: recover parameters from synthetic data
rng_fit = np.random.default_rng(123)
n_true, a_true, b_true = 25, 2.0, 5.0
data = stats.betabinom.rvs(n_true, a_true, b_true, size=2_000, random_state=rng_fit)
fit = fit_betabinom_mle(data, n=n_true)
fit
{'a_hat': 2.041290796966522,
'b_hat': 5.036609641147764,
'success': True,
'nll': 5706.819056152122,
'message': 'CONVERGENCE: RELATIVE REDUCTION OF F <= FACTR*EPSMCH'}
10) Statistical use cases#
Hypothesis testing (binomial vs overdispersed alternative)#
If you have repeated groups with the same \(n\) (e.g., 100 clinics each with \(n\) patients), a binomial model assumes all groups share a single \(p\).
Beta-binomial is a common alternative when the observed variability across groups is too large.
Practical approaches:
Model comparison: compare binomial vs beta-binomial by log-likelihood, AIC/BIC, or held-out predictive performance.
Overdispersion testing: parameterize by mean \(p\) and intraclass correlation \(\rho=1/(\alpha+\beta+1)\). The null is \(\rho=0\) (binomial), which is a boundary case, so classical LRT chi-square theory needs care; a parametric bootstrap is often used.
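A sketch of the parametric-bootstrap comparison on synthetic data. The beta-binomial alternative is profiled over \(s\) on a crude grid here purely for illustration; a real analysis would use a proper optimizer (such as the MLE routine above):

```python
import numpy as np
from scipy import stats

def loglik_binom(x, n):
    """Binomial log-likelihood at the MLE p_hat = mean/n."""
    p = x.mean() / n
    return float(stats.binom.logpmf(x, n, p).sum())

def loglik_bb(x, n, s_grid=np.logspace(-1, 3, 25)):
    """Crude beta-binomial log-likelihood: mean fixed at p_hat, grid over s."""
    p = float(np.clip(x.mean() / n, 1e-6, 1 - 1e-6))
    return max(
        float(stats.betabinom.logpmf(x, n, p * s, (1 - p) * s).sum()) for s in s_grid
    )

rng_lrt = np.random.default_rng(7)
n, n_groups = 30, 200
obs = stats.betabinom.rvs(n, 1.2, 3.5, size=n_groups, random_state=rng_lrt)
lrt_obs = 2 * (loglik_bb(obs, n) - loglik_binom(obs, n))

# Null distribution of the LRT statistic: simulate from the fitted binomial
p0 = obs.mean() / n
lrt_null = np.array([
    2 * (loglik_bb(sim, n) - loglik_binom(sim, n))
    for sim in (rng_lrt.binomial(n, p0, size=n_groups) for _ in range(100))
])
p_value = float(np.mean(lrt_null >= lrt_obs))
print(lrt_obs, p_value)  # strong evidence of overdispersion for this synthetic data
```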
Bayesian modeling#
With prior \(p\sim\mathrm{Beta}(\alpha_0,\beta_0)\) and data \(x\) successes out of \(n\) trials:
Posterior: \(p\mid x \sim \mathrm{Beta}(\alpha_0+x,\;\beta_0+n-x)\)
Posterior predictive for \(m\) future trials: \( Y\mid x \sim \mathrm{BetaBinomial}(m,\;\alpha_0+x,\;\beta_0+n-x) \)
Generative modeling#
Beta-binomial is a compact generative model for count data with bounded support when you expect latent heterogeneity in probabilities.
Examples:
modeling per-user conversions out of impressions
modeling per-batch defects out of inspected items
modeling per-class accuracy counts out of attempts (with varying difficulty)
# Model comparison example: Binomial vs Beta-binomial on synthetic overdispersed data
rng_use = np.random.default_rng(999)
n = 30
a_true, b_true = 1.2, 3.5
data = stats.betabinom.rvs(n, a_true, b_true, size=500, random_state=rng_use)
# Binomial MLE: p_hat = mean/n
p_hat = float(data.mean() / n)
ll_binom = float(np.sum(stats.binom.logpmf(data, n, p_hat)))
aic_binom = 2 * 1 - 2 * ll_binom
# Beta-binomial MLE (a,b), fixed n
fit = fit_betabinom_mle(data, n=n)
ll_bb = -fit["nll"]
aic_bb = 2 * 2 - 2 * ll_bb
pd.DataFrame(
{
"model": ["binomial", "beta-binomial"],
"log_likelihood": [ll_binom, ll_bb],
"AIC": [aic_binom, aic_bb],
}
)
| | model | log_likelihood | AIC |
|---|---|---|---|
| 0 | binomial | -2510.347071 | 5022.694142 |
| 1 | beta-binomial | -1527.653209 | 3059.306418 |
# Bayesian posterior predictive: Beta prior + Binomial likelihood -> Beta posterior -> Beta-binomial predictive
rng_bayes = np.random.default_rng(2024)
# Prior on p
a0, b0 = 2.0, 2.0
# Observed data: x successes out of n_obs
n_obs, x_obs = 20, 8
a_post, b_post = a0 + x_obs, b0 + (n_obs - x_obs)
# Predict m future trials
m = 15
ks = np.arange(m + 1)
pmf_pred = stats.betabinom.pmf(ks, m, a_post, b_post)
# Monte Carlo posterior predictive simulation for comparison
p_samp = rng_bayes.beta(a_post, b_post, size=100_000)
y_samp = rng_bayes.binomial(m, p_samp)
pmf_mc = np.bincount(y_samp, minlength=m + 1) / len(y_samp)
fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pmf_mc, name="posterior predictive (MC)", opacity=0.6))
fig.add_trace(go.Scatter(x=ks, y=pmf_pred, mode="lines+markers", name="beta-binomial closed form"))
fig.update_layout(
title="Posterior predictive: Beta-Binomial matches Monte Carlo",
xaxis_title="future successes k",
yaxis_title="probability",
)
fig.show()
11) Pitfalls#
Invalid parameters#
n must be a nonnegative integer; a and b (i.e., \(\alpha,\beta\)) must be strictly positive.
SciPy will typically return nan for invalid parameters.
Numerical issues#
Naively computing \(\binom{n}{k}\) and \(B(\cdot,\cdot)\) can overflow/underflow for moderate/large \(n\).
Prefer logpmf / betaln / gammaln and only exponentiate at the end.
When comparing very small probabilities, compare log probabilities.
Estimation issues#
If data are close to binomial (little overdispersion), the MLE tends to push \(\alpha+\beta \to \infty\). Numerically, this can lead to large parameter estimates; using a mean/dispersion parameterization \((p,\rho)\) can be more stable.
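One possible sketch of such a reparameterized fit, assuming known \(n\); the helper name, starting values, and optimizer choice are illustrative:

```python
import numpy as np
from scipy import optimize, stats

def fit_betabinom_p_rho(x, n):
    """MLE in (p, rho) space, where rho = 1/(a+b+1), i.e. s = (1 - rho)/rho."""
    x = np.asarray(x)

    def nll(theta):
        p = 1.0 / (1.0 + np.exp(-theta[0]))    # inverse logit: mean in (0, 1)
        rho = 1.0 / (1.0 + np.exp(-theta[1]))  # inverse logit: ICC in (0, 1)
        s = (1.0 - rho) / rho
        return -float(stats.betabinom.logpmf(x, n, p * s, (1 - p) * s).sum())

    res = optimize.minimize(nll, x0=[0.0, -2.0], method="Nelder-Mead")
    p_hat = 1.0 / (1.0 + np.exp(-res.x[0]))
    rho_hat = 1.0 / (1.0 + np.exp(-res.x[1]))
    return {"p_hat": float(p_hat), "rho_hat": float(rho_hat), "success": bool(res.success)}

rng_pr = np.random.default_rng(5)
sim = stats.betabinom.rvs(20, 2.0, 5.0, size=2_000, random_state=rng_pr)
out = fit_betabinom_p_rho(sim, n=20)
print(out)  # true p = 2/7 ~ 0.286, true rho = 1/8 = 0.125
```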
# Invalid parameters in SciPy
ks = np.arange(6)
stats.betabinom.pmf(ks, n=5.5, a=1.0, b=2.0) # invalid n (not integer)
array([nan, nan, nan, nan, nan, nan])
# Underflow demonstration: compare pmf vs logpmf for larger n
n, a, b = 500, 2.0, 5.0
k = 250
pmf_direct = stats.betabinom.pmf(k, n, a, b)
logpmf = stats.betabinom.logpmf(k, n, a, b)
pmf_from_log = float(np.exp(logpmf))
pmf_direct, logpmf, pmf_from_log
(0.001878631572159702, -6.277211654327845, 0.001878631572159702)
12) Summary#
betabinom is a discrete distribution on \(\{0,\dots,n\}\).
It arises as a Beta–Binomial mixture: \(p\sim\mathrm{Beta}(\alpha,\beta)\) then \(X\mid p\sim\mathrm{Binomial}(n,p)\).
Mean and variance:
\[
\mathbb{E}[X]=n\frac{\alpha}{\alpha+\beta},\qquad \mathrm{Var}(X)=n\frac{\alpha\beta}{(\alpha+\beta)^2}\,\frac{n+\alpha+\beta}{\alpha+\beta+1}.
\]
Compared to a binomial with the same mean, beta-binomial allows overdispersion (extra variability) driven by heterogeneity in \(p\).
SciPy provides PMF/CDF/RVS; for fitting \((\alpha,\beta)\) you can maximize the log-likelihood with scipy.optimize.
Suggested exercises#
Fix \(n\) and \(p\) and vary \(s=\alpha+\beta\): quantify how quickly the beta-binomial approaches a Binomial.
Derive factorial moments \(\mathbb{E}[(X)_r]\) using the hierarchical model.
Reparameterize by \((p,\rho)\) with \(\rho=1/(\alpha+\beta+1)\) and fit in that space.